Rmsnorm #136

lee2716 · 2025-11-25T11:33:48Z

Describe the intent of your PR here.

This PR adds support for RMSNorm (Root Mean Square Normalization) operation to the Deeploy framework's Generic platform. RMSNorm is a critical normalization technique used in modern Transformer architectures and large language models. To enable RMSNorm deployment on embedded systems, this PR implements the necessary mathematical primitives (Pow and Sqrt operations) and integrates them into Deeploy's compilation pipeline.

The implementation follows Deeploy's operator decomposition approach, where RMSNorm is constructed from basic mathematical operations rather than as a monolithic kernel. This design provides flexibility and maintainability while supporting both float32 and float16 precision for resource-constrained embedded devices.

Added

Pow (Power) operation support
- FloatPowTemplate.py: Mako template for C code generation
- Pow_fp32.c Kernel implementations for both precisions
- kernel/Pow.h: Kernel interface definitions
- Parser, Layer, and Binding classes for framework integration
Sqrt (Square Root) operation support
- FloatSqrtTemplate.py: Mako template for C code generation
- Sqrt_fp32.c : Kernel implementations
- kernel/Sqrt.h: Kernel interface definitions
- Complete framework integration components
Comprehensive test suites
- testFloatPow : Pow operator tests with ONNX models and reference data
- testFloatSqrt : Sqrt operator tests
- testFloatRMSNorm: End-to-end RMSNorm tests demonstrating operator composition

Changed

Framework integration files
- Deeploy/Targets/Generic/Parsers.py: Added PowParser and SqrtParser for ONNX graph parsing
- Deeploy/Targets/Generic/Layers.py: Added corresponding Layer classes for both operations
- Deeploy/Targets/Generic/Bindings.py: Added type checking and binding registration
- Deeploy/Targets/Generic/Platform.py: Registered new operations in platform mapping
Runtime library headers
- TargetLibraries/Generic/inc/DeeployBasicMath.h: Extended with Pow and Sqrt function declarations
- TargetLibraries/Generic/inc/types.h: Updated type definitions for consistency
CI/CD configuration
- .github/workflows/ci-platform-generic.yml: Updated to include new test cases in automated testing pipeline

Fixed

N/A (This is a feature addition PR with no bug fixes)

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

…16 for Generic

coderabbitai · 2025-11-25T11:43:56Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added Pow operation (power function with scalar and vector exponent support) on the Generic platform
- Added Sqrt operation (square root function) on the Generic platform
- Added RMSNorm operation support via operator decomposition
Tests
- Added test cases for Pow scalar and vector variants, Sqrt, and RMSNorm operations

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Walkthrough

Adds Pow and Sqrt support to the Generic target: new parsers, layer types, bindings, templates (scalar vs vector exponent dispatch), C kernel implementations for float32, a test vector, and CI entries for new tests.

Changes

Cohort / File(s)	Summary
CI Configuration `\.github/workflows/ci-platform-generic.yml`	Appends four new generic-kernels tests: `testFloatPowScalar`, `testFloatPowVector`, `testFloatSqrt`, and `testFloatRMSNorm` to the existing `test-names` list.
Parsers `Deeploy/Targets/Generic/Parsers.py`	Adds `PowParser` and `SqrtParser` (node validation and context extraction) and imports `ConstantBuffer`. Note: duplicate/overlapping parser definitions appear in the diff.
Layers `Deeploy/Targets/Generic/Layers.py`	Adds `PowLayer` and `SqrtLayer` as `ONNXLayer` subclasses (constructors only).
Bindings `Deeploy/Targets/Generic/Bindings.py`	Introduces `BasicPowBindings` and `BasicSqrtBindings` (NodeBindings using `FloatPowTemplate.referenceTemplate` and `FloatSqrtTemplate.referenceTemplate`) and updates import/export lists.
Platform Integration `Deeploy/Targets/Generic/Platform.py`	Registers `PowParser`/`SqrtParser`, creates `PowMapper`/`SqrtMapper` wired to the new basic bindings, and maps `'Pow'` -> `PowLayer([PowMapper])`, `'Sqrt'` -> `SqrtLayer([SqrtMapper])`.
Templates `Deeploy/Targets/Generic/Templates/FloatPowTemplate.py`, `Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py`	Adds `_PowTemplate` with scalar vs vector exponent alignment and `referenceTemplate` selecting appropriate Pow kernel; adds `_SqrtTemplate` and `referenceTemplate` emitting Sqrt kernel call.
Kernel Headers `TargetLibraries/Generic/inc/DeeployBasicMath.h`, `TargetLibraries/Generic/inc/kernel/Pow.h`, `TargetLibraries/Generic/inc/kernel/Sqrt.h`	Adds includes for `kernel/Pow.h` and `kernel/Sqrt.h`; declares `Pow_fp32_fp32_fp32`, `Pow_fp32_scalar_fp32`, and `Sqrt_fp32_fp32` prototypes.
Kernel Implementations `TargetLibraries/Generic/src/Pow_fp32.c`, `TargetLibraries/Generic/src/Sqrt_fp32.c`	Implements element-wise float32 power (array and scalar exponent) using `powf`, and sqrt using `sqrtf`, iterating over `size`.
Tests / Test Vectors `DeeployTest/Tests/testFloatPowVector/network.onnx`	Adds an ONNX-format test vector for a Pow vector operation (data-driven test file).
Repo config `\.gitignore`	Adds `.venv/*` to the ignore list.
Changelog `CHANGELOG.md`	Updates Unreleased section to list RMSNorm, Pow, and Sqrt additions.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant ONNX as ONNX Node
participant Parser as PowParser / SqrtParser
participant Platform as Mapper/Platform
participant Template as FloatPowTemplate / FloatSqrtTemplate
participant Kernel as C kernel (Pow/Sqrt)
Note over ONNX,Parser: model node arrives
ONNX->>Parser: validate node & extract context (inputs, outputs, shapes)
Parser->>Platform: produce mapper + operatorRepresentation
Platform->>Template: alignToContext(operatorRepresentation)
Template-->>Platform: operatorRepresentation (is_scalar, size, args)
Platform->>Kernel: call selected kernel function with args
Kernel-->>Platform: compute results (data_out)
Platform-->>ONNX: mapped node integrated into generated code/runtime

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Inspect Deeploy/Targets/Generic/Parsers.py for duplicate/overlapping PowParser/SqrtParser definitions (likely merge artifact).
Verify scalar-vs-vector detection and size-consistency checks in FloatPowTemplate.py.
Confirm Platform.py mapper wiring and exports for new layers/parsers/bindings.
Review C kernels (Pow_fp32.c, Sqrt_fp32.c) for includes, restrict usage, loop bounds, and correct math functions.

Possibly related PRs

Split CI Workflows by Platform and Task, Improve Formatting and Linting Reliability #108 — Related CI workflow entries; appears connected to additions in .github/workflows/ci-platform-generic.yml.

Suggested labels

Feature

Suggested reviewers

Victor-Jung
Xeratec
lukamac

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 8.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'Rmsnorm' is only partially related to the changeset. While RMSNorm is mentioned in the PR description as the overall goal, the actual changes focus significantly on adding Pow and Sqrt operations as building blocks, which represent substantial portions of the implementation.
Description check	✅ Passed	The PR description is comprehensive and directly related to the changeset, detailing RMSNorm support, Pow and Sqrt operations, framework integration, and testing. It clearly describes the implementation approach and scope.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1785a8 and 2e857bd.

📒 Files selected for processing (1)

CHANGELOG.md (2 hunks)

🔇 Additional comments (1)

CHANGELOG.md (1)

7-7: Changelog entries accurately capture the PR's contributions. ✓

The PR reference and feature descriptions clearly summarize the addition of RMSNorm support via Pow and Sqrt operators, with appropriate detail on platform-specific implementation (parsers, layers, bindings, templates, FP32 kernels). The entries are well-placed in both the "List of Pull Requests" and "Added" sections and follow the existing format.

Also applies to: 30-31

Tip

✨ Issue Enrichment is now available for GitHub issues!

CodeRabbit can now help you manage issues more effectively:

Duplicate Detection — Identify similar or duplicate issues
Related Issues & PRs — Find relevant issues and PR's from your repository
Suggested Assignees — Find the best person to work on the issue
Implementation Planning — Generate detailed coding plans for engineers and agents

Disable automatic issue enrichment

To disable automatic issue enrichment, add the following to your .coderabbit.yaml:

issue_enrichment:
  auto_enrich:
    enabled: false

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (6)

Deeploy/DeeployTypes.py (1)

327-349: Docstring matches behavior; minor cleanup possible in visited set

The updated “live ancestors” wording matches the breadth‑first walk over the alias graph and better explains what’s being checked. One small implementation nit: visited = set(self.name) builds a set of characters rather than a set of buffer names; using {self.name} would make the intent clearer and avoid mixing types in visited, even though it doesn’t currently break correctness.

TargetLibraries/Generic/src/Sqrt_fp32.c (1)

1-13: Elementwise fp32 sqrt kernel looks correct

The Sqrt_fp32_fp32 implementation is straightforward and type‑consistent with float32_t/int32_t, doing an elementwise sqrtf over the input range. Assuming sqrtf is declared via the transitive includes from DeeployBasicMath.h, there are no correctness issues here.

TargetLibraries/Generic/src/Pow_fp16.c (1)

1-26: Pow_fp16 implementation is correct for integer exponents; consider faster exponentiation

The kernel correctly handles zero and negative integer exponents and writes elementwise base^exponent into data_out. For typical small exponents this is fine, but the linear for (j = 0; j < exp; j++) loop makes runtime proportional to |exponent|. If you expect larger exponents or care about worst‑case latency, consider switching to exponentiation‑by‑squaring on a promoted float accumulator for better performance and numerical behavior, while preserving the float16_t I/O interface.

Deeploy/Targets/Generic/Layers.py (1)

230-240: PowLayer/SqrtLayer wiring is minimal and consistent with existing layers

The new PowLayer and SqrtLayer classes correctly follow the existing pattern of thin ONNXLayer wrappers around mappers. For current usage this is sufficient. If accurate op‑count reporting or explicit broadcasting for Pow becomes important, you may later want to override computeOps (e.g., proportional to tensor size) and, if needed, computeShapes similar to AddLayer/MulLayer.
Deeploy/Targets/Generic/Parsers.py (1)
1967-2001: Duplicate PowParser/SqrtParser definitions and mismatched exponent field

There are two separate definitions of PowParser and SqrtParser in this file: one here and another at lines 2813–2869. The later definitions override these ones at import time, so this block is effectively dead code and also:

Triggers lints (PowParser/SqrtParser redefinition, undefined ConstantBuffer on Line 1990).

Uses exponent_value instead of exponent, which doesn’t match FloatPowTemplate.alignToContext, where exponent is expected and exponent_value is derived there.

To avoid confusion and static-analysis noise, I’d consolidate to a single implementation (the newer one) and delete this earlier block entirely. A minimal fix would look like:
-class PowParser(NodeParser):
-    ...
-
-
-class SqrtParser(NodeParser):
-    ...
-
leaving only the final PowParser/SqrtParser definitions at the bottom of the file.

Also applies to: 2003-2023
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)

14-28: FloatSqrtTemplate matches kernels; consider removing unused data_out

The template and alignToContext correctly:

Infer data_type from data_in and

Dispatch to Sqrt_fp32_fp32 / Sqrt_fp16_fp16 with the right arguments.

The only nit is that data_out = ctxt.lookup(operatorRepresentation['data_out']) is never used in alignToContext; you can safely drop that line to quiet Ruff and keep the function minimal.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e07cd13 and 30cfabb.

📒 Files selected for processing (16)

.github/workflows/ci-platform-generic.yml (1 hunks)
Deeploy/DeeployTypes.py (3 hunks)
Deeploy/Targets/Generic/Bindings.py (2 hunks)
Deeploy/Targets/Generic/Layers.py (1 hunks)
Deeploy/Targets/Generic/Parsers.py (2 hunks)
Deeploy/Targets/Generic/Platform.py (3 hunks)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1 hunks)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1 hunks)
TargetLibraries/Generic/inc/DeeployBasicMath.h (1 hunks)
TargetLibraries/Generic/inc/kernel/Pow.h (1 hunks)
TargetLibraries/Generic/inc/kernel/Sqrt.h (1 hunks)
TargetLibraries/Generic/inc/types.h (1 hunks)
TargetLibraries/Generic/src/Pow_fp16.c (1 hunks)
TargetLibraries/Generic/src/Pow_fp32.c (1 hunks)
TargetLibraries/Generic/src/Sqrt_fp16.c (1 hunks)
TargetLibraries/Generic/src/Sqrt_fp32.c (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (7)

TargetLibraries/Generic/inc/kernel/Sqrt.h (3)

TargetLibraries/Generic/src/Sqrt_fp32.c (1)

Sqrt_fp32_fp32 (9-13)

DeeployTest/testUtils/dmaUtils.py (1)

size (72-73)

TargetLibraries/Generic/src/Sqrt_fp16.c (1)

Sqrt_fp16_fp16 (9-13)

TargetLibraries/Generic/inc/kernel/Pow.h (2)

TargetLibraries/Generic/src/Pow_fp32.c (1)

Pow_fp32_int32_fp32 (9-27)

TargetLibraries/Generic/src/Pow_fp16.c (1)

Pow_fp16_int32_fp16 (8-26)

Deeploy/Targets/Generic/Layers.py (1)

Deeploy/DeeployTypes.py (2)

ONNXLayer (1819-2147)

NodeMapper (1660-1816)

Deeploy/Targets/Generic/Parsers.py (1)

Deeploy/Targets/Snitch/Parsers.py (3)

parseNode (15-26)

parseNodeCtxt (28-42)

parseNodeCtxt (60-74)

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (3)

Deeploy/DeeployTypes.py (2)

NetworkContext (508-1020)

NodeTemplate (87-229)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)

alignToContext (14-28)

Deeploy/AbstractDataTypes.py (1)

typeName (312-313)

Deeploy/Targets/Generic/Platform.py (3)

Deeploy/Targets/Generic/Layers.py (2)

PowLayer (230-233)

SqrtLayer (236-239)

Deeploy/Targets/Generic/Parsers.py (4)

PowParser (1967-2000)

PowParser (2814-2846)

SqrtParser (2003-2023)

SqrtParser (2849-2869)

Deeploy/DeeployTypes.py (1)

NodeMapper (1660-1816)

Deeploy/Targets/Generic/Bindings.py (2)

Deeploy/CommonExtensions/DataTypes.py (2)

float16_t (67-71)

float32_t (74-78)

Deeploy/DeeployTypes.py (2)

CodeTransformation (2290-2324)

NodeBinding (1512-1657)

🪛 Ruff (0.14.5)

Deeploy/Targets/Generic/Parsers.py

1978-1978: Unused method argument: channels_first

(ARG002)

1990-1990: Undefined name ConstantBuffer

(F821)

1995-1996: Prefer TypeError exception for invalid type

(TRY004)

1995-1996: Avoid specifying long messages outside the exception class

(TRY003)

2014-2014: Unused method argument: channels_first

(ARG002)

2814-2814: Redefinition of unused PowParser from line 1967

(F811)

2825-2825: Unused method argument: channels_first

(ARG002)

2849-2849: Redefinition of unused SqrtParser from line 2003

(F811)

2860-2860: Unused method argument: channels_first

(ARG002)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py

19-19: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

19-19: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

30-30: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (9)

.github/workflows/ci-platform-generic.yml (1)

76-81: New generic-kernels tests are wired correctly in CI list

The added Pow/Sqrt/RMSNorm tests fit the existing naming pattern and placement in the float test block; no workflow logic changes. Please just confirm the test targets are defined under those exact names so CI doesn’t fail on a typo.

Deeploy/DeeployTypes.py (1)

2565-2589: _mapNode refactor cleanly decouples engine selection from layer construction

The new _mapNode that iterates engines and calls engine.Mapping[node.op](node) looks correct and keeps NetworkContainer._bindLayers simpler by returning either an ONNXLayer or side‑effect‑only mappings (e.g., Constant hoisting). The isinstance(layer, ONNXLayer) guard in _bindLayers ensures only real layers are scheduled, while Constant mappings still run via their lambda. This preserves multi‑engine support and fails fast with a clear RuntimeError when no mapping exists.

TargetLibraries/Generic/inc/DeeployBasicMath.h (1)

47-55: Pow/Sqrt kernel includes are correctly integrated

Including kernel/Pow.h and kernel/Sqrt.h alongside the other basic math kernels is consistent and sufficient to expose the new operations to users of DeeployBasicMath.h.

TargetLibraries/Generic/src/Sqrt_fp16.c (1)

1-13: fp16 sqrt kernel is consistent with the fp32 path

Sqrt_fp16_fp16 mirrors the fp32 implementation, applying sqrtf elementwise and relying on the float16_t typedef for the actual storage type. This is a reasonable, simple implementation for FP16 support and aligns with the new type definition.

TargetLibraries/Generic/inc/types.h (1)

13-21: float16_t typedef is sensible and keeps non-FP16 platforms building

Defining float16_t as _Float16 when compiler support is detected, and otherwise aliasing it to float, gives the new Pow/Sqrt FP16 kernels a consistent type while preserving buildability on targets without native FP16. The surrounding comments clearly document this fallback behavior.

TargetLibraries/Generic/inc/kernel/Sqrt.h (1)

1-24: Sqrt kernel header matches implementations

The include guard, DeeployBasicMath dependency, and fp32/fp16 prototypes are consistent with the corresponding C kernels; no issues from a correctness or integration perspective.

TargetLibraries/Generic/inc/kernel/Pow.h (1)

1-25: Pow kernel header is consistent with C implementations

Prototypes and guard are well-formed and match the Pow_fp32/Pow_fp16 C kernels; nothing blocking here.

Deeploy/Targets/Generic/Bindings.py (1)

10-11: Pow/Sqrt bindings are wired consistently with templates and types

The new BasicPowBindings/BasicSqrtBindings correctly:

Use float32_t/float16_t pointer types for inputs/outputs.

Bind to FloatPowTemplate.referenceTemplate and FloatSqrtTemplate.referenceTemplate.

Reuse DummyChecker and BasicTransformer in line with nearby float ops.

Once the Pow parser/template exponent checks are tightened as discussed, these bindings look sound.

Also applies to: 18-22, 121-133

Deeploy/Targets/Generic/Platform.py (1)

10-17: Pow/Sqrt integration into Generic platform is coherent

The new imports, PowMapper/SqrtMapper definitions, and 'Pow'/'Sqrt' entries in GenericMapping line up correctly with:

BasicPowBindings / BasicSqrtBindings,

PowLayer / SqrtLayer, and

The Pow/Sqrt kernels exposed via DeeployBasicMath.h.

Assuming DeeployBasicMath.h now includes the new kernel/Pow.h and kernel/Sqrt.h, the end‑to‑end wiring looks correct.

Please double‑check that DeeployBasicMath.h actually includes the new Pow/Sqrt kernel headers so generated code has the necessary prototypes.

Also applies to: 20-22, 27-29, 56-57, 104-105

Deeploy/Targets/Generic/Parsers.py

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

TargetLibraries/Generic/src/Pow_fp32.c

The current implementation does not support true FP16 arithmetic. Instead, data is cast to FP32 internally. These tests are being removed to avoid misleading results until native half-precision support is implemented.

diaconuccalin

Good job overall, but there are some changes that we need to address.

Most of them concern the following:

Remove integer exponent enforcement for Pow
Remove constant exponent enforcement for Pow
Remove all traces of FP16 version for Generic, since the compiler for this platform doesn't support this format (as we talked privately, we will use it directly in Snitch, since here it would only help us create the proper infrastructure, like binding and parser, but we've already done it with FP32)

Deeploy/Targets/Generic/Parsers.py

Deeploy/DeeployTypes.py

TargetLibraries/Generic/inc/kernel/Pow.h

TargetLibraries/Generic/inc/kernel/Sqrt.h

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

Deeploy/Targets/Generic/Parsers.py

TargetLibraries/Generic/inc/kernel/Pow.h

TargetLibraries/Generic/src/Pow_fp32.c

DeeployTest/Tests/testFloatRMSNorm/inputs.npz

This commit addresses code review feedback: - Refactor Pow kernel to use 'powf' from math.h to support floating-point exponents. - Update PowParser to allow tensor exponents instead of forcing constants. - Remove Generic FP16 support and revert types.h changes. - Remove duplicate PowParser/SqrtParser classes. - Enhance RMSNorm tests with larger shapes and non-trivial weights.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)
12-13: Remove unused variable lookup.

The data_out variable is looked up but never used in alignToContext. Since the template only needs data_in to infer the data type and compute size, you can safely remove this line.

Apply this diff:
     # Get input and output tensors
     data_in = ctxt.lookup(operatorRepresentation['data_in'])
-    data_out = ctxt.lookup(operatorRepresentation['data_out'])
     
     # Get data type (fp32)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)
12-14: Remove unused variable lookup.

The data_out variable is looked up but never used in alignToContext. The method only needs data_in and exponent to infer the data type and compute sizes.

Apply this diff:
     # Get input and output tensors
     data_in = ctxt.lookup(operatorRepresentation['data_in'])
     exponent = ctxt.lookup(operatorRepresentation['exponent'])
-    data_out = ctxt.lookup(operatorRepresentation['data_out'])
     
     # Get data type (fp32)
Deeploy/Targets/Generic/Bindings.py (1)

121-129: Consider more specific type checkers for Pow and Sqrt.

The bindings use DummyChecker which provides minimal type validation. While this may be intentional for flexibility, you might want to define dedicated PowChecker and SqrtChecker classes (similar to AddChecker, MulChecker, etc.) to provide more specific type validation for these operations.

This can be deferred if the current approach aligns with the project's type-checking strategy. The bindings are otherwise correctly structured.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30cfabb and 8f90620.

📒 Files selected for processing (8)

.github/workflows/ci-platform-generic.yml (1 hunks)
Deeploy/Targets/Generic/Bindings.py (2 hunks)
Deeploy/Targets/Generic/Parsers.py (3 hunks)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1 hunks)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1 hunks)
TargetLibraries/Generic/inc/kernel/Pow.h (1 hunks)
TargetLibraries/Generic/inc/kernel/Sqrt.h (1 hunks)
TargetLibraries/Generic/src/Pow_fp32.c (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/ci-platform-generic.yml

🧰 Additional context used

🧬 Code graph analysis (5)

TargetLibraries/Generic/inc/kernel/Sqrt.h (2)

TargetLibraries/Generic/src/Sqrt_fp32.c (1)

Sqrt_fp32_fp32 (9-13)

DeeployTest/testUtils/dmaUtils.py (1)

size (72-73)

Deeploy/Targets/Generic/Bindings.py (3)

Deeploy/CommonExtensions/DataTypes.py (1)

float32_t (74-78)

Deeploy/DeeployTypes.py (2)

CodeTransformation (2290-2324)

NodeBinding (1512-1657)

Deeploy/AbstractDataTypes.py (1)

PointerClass (536-559)

TargetLibraries/Generic/inc/kernel/Pow.h (2)

TargetLibraries/Generic/src/Pow_fp32.c (2)

Pow_fp32_fp32_fp32 (10-17)

Pow_fp32_scalar_fp32 (19-26)

DeeployTest/testUtils/dmaUtils.py (1)

size (72-73)

Deeploy/Targets/Generic/Parsers.py (1)

Deeploy/DeeployTypes.py (7)

NetworkContext (508-1020)

NodeParser (1023-1198)

VariableBuffer (232-360)

ConstantBuffer (393-430)

parseNode (1033-1048)

inputs (2503-2520)

parseNodeCtxt (1051-1076)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)

alignToContext (9-34)

🪛 Ruff (0.14.6)

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

14-14: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

Deeploy/Targets/Generic/Parsers.py

1978-1978: Unused method argument: channels_first

(ARG002)

1995-1996: Prefer TypeError exception for invalid type

(TRY004)

1995-1996: Avoid specifying long messages outside the exception class

(TRY003)

2799-2799: Unused method argument: channels_first

(ARG002)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py

13-13: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

🔇 Additional comments (6)

TargetLibraries/Generic/inc/kernel/Sqrt.h (1)

20-20: LGTM!

The function signature is correct for an element-wise square root operation. The naming convention follows the pattern seen in other kernels and the parameters are appropriate.

TargetLibraries/Generic/inc/kernel/Pow.h (1)

16-24: LGTM!

Both function signatures correctly use float32_t for the exponent parameter(s), which allows the kernels to support general floating-point exponents via powf. The const and restrict qualifiers are appropriate.

TargetLibraries/Generic/src/Pow_fp32.c (1)

10-26: LGTM!

Both kernel implementations correctly use powf which supports general floating-point exponents. The array-based and scalar-based variants are implemented appropriately for broadcasting scenarios.
Deeploy/Targets/Generic/Parsers.py (2)
2788-2808: LGTM!

The SqrtParser implementation is straightforward and correct for a unary square root operation. It properly extracts the input/output tensors and computes the size.

Note: The channels_first parameter is unused (flagged by static analysis), but this is likely required by the NodeParser interface.

1990-1996: Incorrect exponent handling: casting to int loses precision and enforcing constants limits functionality.

There are two critical issues here:

Integer casting loses precision: Line 1991 casts the exponent to int, but the C kernel Pow_fp32_fp32_fp32 and Pow_fp32_scalar_fp32 use powf which supports floating-point exponents. For example, an exponent of 2.5 would be silently truncated to 2, producing incorrect results.

Constant enforcement is too restrictive: Lines 1994-1996 reject non-constant (variable tensor) exponents, but this unnecessarily limits the operator's functionality. Per past review feedback and the ONNX Pow specification, variable exponents should be supported.

Apply this diff to support float exponents and remove constant enforcement:
     # Extract exponent value from the constant tensor
     if isinstance(exponent_tensor, ConstantBuffer):
-        exp_value = int(exponent_tensor.values.flatten()[0])
-        self.operatorRepresentation['exponent_value'] = exp_value
-    else:
-        # Tensor exponent not supported
-        raise ValueError(f"Node {node.name}: Exponent must be a constant. "
-                         f"Variable tensor exponents are not supported.")
+        exp_value = float(exponent_tensor.values.flatten()[0])
+        self.operatorRepresentation['exponent_value'] = exp_value
+    # Variable tensor exponents are now supported via the array-based kernel
Based on learnings from past reviews requesting float exponent support and removal of constant enforcement.

Likely an incorrect or invalid review comment.
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)

25-34: LGTM!

The scalar broadcasting logic is well-implemented. The template correctly distinguishes between scalar and array exponents, selecting the appropriate kernel (Pow_fp32_scalar_fp32 vs Pow_fp32_fp32_fp32) and constructing the proper variable reference for scalar exponents.

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

diaconuccalin

Great progress and a good idea with the scalar/non-scalar exponent differentiation! I've left a few more improvement suggestions.

Beside the comments below, don't forget to run make format, so we get all green tests + update the CHANGELOG.md file as well (it is in the root folder). You should add the information you already have in your PR description to the top of each corresponding list in the changelog file.

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py

Deeploy/Targets/Generic/Parsers.py

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

diaconuccalin · 2025-12-03T18:40:33Z

TargetLibraries/Generic/src/Pow_fp32.c

+#include "DeeployBasicMath.h"
+#include <math.h>
+
+void Pow_fp32_fp32_fp32(const float32_t *__restrict__ data_in, 


Currently we only have a test for the scalar situation. Please add one for vector exponents, while keeping the scalar one (rename existent one if it makes more sense + add the new one to the CI pipeline asa well).

…te, changelog)

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)
13-29: Approve dynamic typing; optional cleanup for unused variable.

The template correctly uses dynamic type_width extraction (data_in._type.referencedType.typeWidth) and applies it in the kernel call (Sqrt_fp${type_width}_fp${type_width}), following best practices from learnings to avoid hardcoded types.

Optionally, you can remove the unused data_out lookup at line 17 to clean up the code, though its presence is harmless:
-        data_out = ctxt.lookup(operatorRepresentation['data_out'])
-
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)
13-49: LGTM! Size validation and dynamic typing correctly implemented.

The template properly:

Uses dynamic type_width extraction following best practices.

Implements scalar vs. vector dispatch logic based on exponent shape.

Validates input_size == exponent_size for non-scalar exponents (line 42-44), directly addressing previous reviewer feedback.

Optionally, you can remove the unused data_out lookup at line 18:
-        data_out = ctxt.lookup(operatorRepresentation['data_out'])
-

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f90620 and e1785a8.

📒 Files selected for processing (9)

.github/workflows/ci-platform-generic.yml (1 hunks)
.gitignore (1 hunks)
Deeploy/Targets/Generic/Bindings.py (2 hunks)
Deeploy/Targets/Generic/Parsers.py (3 hunks)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1 hunks)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1 hunks)
DeeployTest/Tests/testFloatPowVector/network.onnx (1 hunks)
TargetLibraries/Generic/inc/kernel/Pow.h (1 hunks)
TargetLibraries/Generic/src/Pow_fp32.c (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/ci-platform-generic.yml

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-12-02T13:54:22.700Z

Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 69
File: Deeploy/Targets/PULPOpen/Templates/FloatLayernormTemplate.py:36-38
Timestamp: 2025-12-02T13:54:22.700Z
Learning: In Deeploy templates (Python files in Deeploy/Targets/PULPOpen/Templates/), always use explicit bitwidth types (e.g., `float${...type.referencedType.typeWidth}_t*`) instead of hardcoded types (e.g., `float*`) to ensure type consistency with templated kernel calls.

Applied to files:

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py
TargetLibraries/Generic/src/Pow_fp32.c
Deeploy/Targets/Generic/Bindings.py
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py

🧬 Code graph analysis (3)

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (2)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)

alignToContext (13-29)

Deeploy/AbstractDataTypes.py (2)

typeName (312-313)

typeWidth (399-400)

TargetLibraries/Generic/inc/kernel/Pow.h (1)

TargetLibraries/Generic/src/Pow_fp32.c (2)

Pow_fp32_fp32_fp32 (10-16)

Pow_fp32_scalar_fp32 (18-24)

Deeploy/Targets/Generic/Parsers.py (1)

Deeploy/DeeployTypes.py (4)

ConstantBuffer (393-430)

NetworkContext (508-1020)

VariableBuffer (232-360)

lookup (720-752)

🪛 Ruff (0.14.7)

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

18-18: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

43-44: Avoid specifying long messages outside the exception class

(TRY003)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py

17-17: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

Deeploy/Targets/Generic/Parsers.py

1978-1978: Unused method argument: channels_first

(ARG002)

2789-2789: Unused method argument: channels_first

(ARG002)

🔇 Additional comments (7)

.gitignore (1)

27-27: LGTM!

Adding .venv/* to the ignore list is a sensible best practice for Python projects and aligns with the introduction of Python-based build tooling and tests in this PR.

Deeploy/Targets/Generic/Parsers.py (2)

1967-1991: LGTM! Exponent handling is now flexible.

The PowParser correctly treats the exponent as a general tensor input without enforcing it to be a constant, aligning with the ONNX Pow specification and previous reviewer feedback. The scalar vs. vector dispatch is appropriately delegated to the template layer.

2778-2798: LGTM! SqrtParser implementation is correct.

The parser correctly extracts input, output, and size information for the Sqrt operation. The unused channels_first parameter is acceptable as it conforms to the NodeParser.parseNodeCtxt interface signature, which is used consistently across all parsers in this file.

TargetLibraries/Generic/inc/kernel/Pow.h (1)

16-22: LGTM! Dual kernel variants provide flexibility.

The header correctly declares both vector (Pow_fp32_fp32_fp32) and scalar (Pow_fp32_scalar_fp32) exponent variants, enabling efficient dispatch based on exponent shape. The signatures use proper const-correctness and restrict qualifiers.

TargetLibraries/Generic/src/Pow_fp32.c (1)

10-24: LGTM! Implementations correctly use powf for full float exponent support.

Both kernel variants properly leverage powf from math.h, which supports arbitrary floating-point exponents including fractional and negative values. This ensures full ONNX Pow semantics compliance and addresses previous concerns about integer-only exponents.

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)

52-59: LGTM! Kernel dispatch correctly implemented.

The reference template properly dispatches between scalar and vector Pow kernels based on the is_scalar flag, with consistent use of dynamic type widths (${type_width}) throughout both branches.

Deeploy/Targets/Generic/Bindings.py (1)

121-129: LGTM! Bindings correctly wire Pow and Sqrt operations.

The new BasicPowBindings and BasicSqrtBindings properly integrate the Pow and Sqrt templates into the platform. The use of DummyChecker with appropriate float32 pointer types is consistent with similar operations in this file, and both bindings correctly reference their respective templates and apply the BasicTransformer.

diaconuccalin

Great job, good effort! :) I've looked through the most recent changes and everything seems fine, we will try to merge it as soon as possible.

lee2716 and others added 4 commits November 19, 2025 14:49

test 1

9b74876

test

5d499d4

Merge branch 'pulp-platform:devel' into rmsnorm

3499a64

add pow2_fp16,pow2_fp32,sqrt_fp16,sqrt_fp32, rmsnorm_fp32, rmsnorm_fp…

30cfabb

…16 for Generic

lee2716 requested review from Victor-Jung and Xeratec as code owners November 25, 2025 11:33

coderabbitai bot reviewed Nov 25, 2025

View reviewed changes

Deeploy/Targets/Generic/Parsers.py Outdated Show resolved Hide resolved

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py Show resolved Hide resolved

TargetLibraries/Generic/src/Pow_fp32.c Show resolved Hide resolved

lee2716 marked this pull request as draft November 25, 2025 17:09

lee2716 added 2 commits November 27, 2025 11:18

Remove FP16 tests due to lack of native support

360519c

The current implementation does not support true FP16 arithmetic. Instead, data is cast to FP32 internally. These tests are being removed to avoid misleading results until native half-precision support is implemented.

Removing FP16 tests in CI

fee8470

diaconuccalin requested changes Nov 28, 2025

View reviewed changes

lee2716 marked this pull request as ready for review November 28, 2025 20:33

lee2716 marked this pull request as draft November 28, 2025 20:35

coderabbitai bot reviewed Nov 28, 2025

View reviewed changes

recover the teseRMSNorm and updated 128*128 version of testFloatRMSNorm

4834c2a

lee2716 marked this pull request as ready for review November 28, 2025 21:23

lee2716 requested a review from diaconuccalin December 1, 2025 09:54

diaconuccalin reviewed Dec 3, 2025

View reviewed changes

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py Show resolved Hide resolved

diaconuccalin requested changes Dec 3, 2025

View reviewed changes

Fix: Address review comments (dynamic types, vector Pow test, CI upda…

e1785a8

…te, changelog)

coderabbitai bot reviewed Dec 4, 2025

View reviewed changes

remove the testFloatPow, which has been renamed to testFloatPowScalar

8ecf929

diaconuccalin approved these changes Dec 5, 2025

View reviewed changes

add change log

2e857bd

Rmsnorm #136

Are you sure you want to change the base?

Rmsnorm #136

Uh oh!

Conversation

lee2716 commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe the intent of your PR here.

Added

Changed

Fixed

PR Merge Checklist

Uh oh!

coderabbitai bot commented Nov 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

diaconuccalin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

diaconuccalin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

diaconuccalin Dec 3, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

diaconuccalin left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lee2716 commented Nov 25, 2025 •

edited

Loading

coderabbitai bot commented Nov 25, 2025 •

edited

Loading